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Abstract — Compressive sensing (CS) is an alternative to 
Shannon/Nyquist sampling for the acquisition of sparse or com- 
pressible signals that can be well approximated by just K <C N 
elements from an iV-dimensional basis. Instead of taking periodic 
samples, CS measures inner products with M < N random 
vectors and then recovers the signal via a sparsity-seeking 
optimization or greedy algorithm. Standard CS dictates that 
robust signal recovery is possible from M = 0(Klog(N/K)) 
measurements. It is possible to substantially decrease M without 
sacrificing robustness by leveraging more realistic signal models 
that go beyond simple sparsity and compressibility by including 
structural dependencies between the values and locations of the 
signal coefficients. This paper introduces a model-based CS the- 
ory that parallels the conventional theory and provides concrete 
guidelines on how to create model-based recovery algorithms with 
provable performance guarantees. A highlight is the introduction 
of a new class of structured compressible signals along with a 
new sufficient condition for robust structured compressible signal 
recovery that we dub the restricted amplification property, which 
is the natural counterpart to the restricted isometry property 
of conventional CS. Two examples integrate two relevant signal 
models — wavelet trees and block sparsity — into two state- 
of-the-art CS recovery algorithms and prove that they offer 
robust recovery from just M = O (K) measurements. Extensive 
numerical simulations demonstrate the validity and applicability 
of our new theory and algorithms. 

Index Terms — Compressive sensing, sparsity, signal model, 
union of subspaces, wavelet tree, block sparsity 



I. Introduction 

WE ARE in the midst of a digital revolution that is 
enabling the development and deployment of new 
sensors and sensing systems with ever increasing fidelity and 
resolution. The theoretical foundation is the Shannon/Nyquist 
sampling theorem, which states that a signal's information is 
preserved if it is uniformly sampled at a rate at least two 
times faster than its Fourier bandwidth. Unfortunately, in many 
important and emerging applications, the resulting Nyquist rate 
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can be so high that we end up with too many samples and 
must compress in order to store or transmit them. In other 
applications the cost of signal acquisition is prohibitive, either 
because of a high cost per sample, or because state-of-the-art 
samplers cannot achieve the high sampling rates required by 
Shannon/Nyquist. Examples include radar imaging and exotic 
imaging modalities outside visible wavelengths. 

Transform compression systems reduce the effective di- 
mensionality of an TV-dimensional signal x by re-representing 
it in terms of a sparse or compressible set of coefficients a in 
a basis expansion x = 'fa, with W an N x N basis matrix. By 
sparse we mean that only K -C N of the coefficients a are 
nonzero and need to be stored or transmitted. By compressible 
we mean that the coefficients a, when sorted, decay rapidly 
enough to zero that a can be well-approximated as TV-sparse. 
The sparsity and compressibility properties are pervasive in 
many signal classes of interest. For example, smooth signals 
and images are compressible in the Fourier basis, while 
piecewise smooth signals and images are compressible in a 
wavelet basis [1]; the JPEG and JPEG2000 standards are 
examples of practical transform compression systems based 
on these bases. 

Compressive sensing (CS) provides an alternative to 
Shannon/Nyquist sampling when the signal under acquisition 
is known to be sparse or compressible [2-4]. In CS, we 
measure not periodic signal samples but rather inner products 
with M <C N measurement vectors. In matrix notation, the 
measurements y = &x = Q^a, where the rows of the 
M x TV matrix $ contain the measurement vectors. While 
the matrix is rank deficient, and hence loses information 
in general, it can be shown to preserve the information in 
sparse and compressible signals if it satisfies the so-called 
restricted isometry property (RIP) [3]. Intriguingly, a large 
class of random matrices have the RIP with high probability. 
To recover the signal from the compressive measurements y, 
we search for the sparsest coefficient vector a that agrees 
with the measurements. To date, research in CS has focused 
primarily on reducing both the number of measurements M 
(as a function of TV and K) and on increasing the robustness 
and reducing the computational complexity of the recovery 
algorithm. Today's state-of-the-art CS systems can robustly 
recover TV-sparse and compressible signals from just M = 
O (TV \og(N/K )) noisy measurements using polynomial-time 
optimization solvers or greedy algorithms. 

While this represents significant progress from Nyquist- 
rate sampling, our contention in this paper is that it is possible 
to do even better by more fully leveraging concepts from 
state-of-the-art signal compression and processing algorithms. 
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In many such algorithms, the key ingredient is a more re- 
alistic structured sparsity model that goes beyond simple 
sparsity by codifying the inter-dependency structure among 
the signal coefficients a. 1 For instance, modern wavelet image 
coders exploit not only the fact that most of the wavelet 
coefficients of a natural image are small but also the fact 
that the values and locations of the large coefficients have 
a particular structure. Coding the coefficients according to a 
structured sparsity model enables these algorithms to compress 
images close to the maximum amount possible - significantly 
better than a naive coder that just processes each large co- 
efficient independently. We have previously developed a new 
CS recovery algorithm that promotes structure in the sparse 
representation by tailoring the recovered signal according to 
a sparsity-promoting probabilistic model, such as an Ising 
graphical model [5]. Such probabilistic models favor certain 
configurations for the magnitudes and indices of the significant 
coefficients of the signal. 

In this paper, we expand on this concept by introduc- 
ing a model-based CS theory that parallels the conventional 
theory and provides concrete guidelines on how to create 
structured signal recovery algorithms with provable perfor- 
mance guarantees. By reducing the number of degrees of 
freedom of a sparse/compressible signal by permitting only 
certain configurations of the large and zero/small coefficients, 
structured sparsity models provide two immediate benefits to 
CS. First, they enable us to reduce, in some cases significantly, 
the number of measurements M required to stably recover a 
signal. Second, during signal recovery, they enable us to better 
differentiate true signal information from recovery artifacts, 
which leads to a more robust recovery. 

To precisely quantify the benefits of model-based CS, 
we introduce and study several new theoretical concepts that 
could be of more general interest. We begin with structured 
sparsity models for if -sparse signals and make precise how 
the structure reduces the number of potential sparse signal 
supports in a. Then using the model-based restricted isometry 
property from [6,7], we prove that such structured sparse 
signals can be robustly recovered from noisy compressive 
measurements. Moreover, we quantify the required number of 
measurements M and show that for some structured sparsity 
models M is independent of N. These results unify and gen- 
eralize the limited related work to date on structured sparsity 
models for strictly sparse signals [6-10]. We then introduce the 
notion of a structured compressible signal, whose coefficients 
a are no longer strictly sparse but have a structured power- 
law decay. To establish that structured compressible signals 
can be robustly recovered from compressive measurements, we 
generalize the standard RIP to a new restricted amplification 
property (RAmP). Using the RAmP, we show that the required 
number of measurements M for recovery of structured com- 

1 Obviously, sparsity and compressibility correspond to simple signal mod- 
els where each coefficient is treated independently; for example in a sparse 
model, the fact that the coefficient on is large has no bearing on the size of 
any a j> 3 i- We w iH reserve the use of the term "model" for situations 
where we are enforcing structured dependencies between the values and the 
locations of the coefficients on. 



pressible signals is independent of N. 

To take practical advantage of this new theory, we demon- 
strate how to integrate structured sparsity models into two 
state-of-the-art CS recovery algorithms, CoSaMP [11] and 
iterative hard thresholding (IHT) [12-16]. The key modifica- 
tion is surprisingly simple: we merely replace the nonlinear 
sparse approximation step in these greedy algorithms with a 
structured sparse approximation. Thanks to our new theory, 
both new model-based recovery algorithms have provable 
robustness guarantees for both structured sparse and structured 
compressible signals. 

To validate our theory and algorithms and demonstrate 
their general applicability and utility, we present two specific 
instances of model-based CS and conduct a range of simula- 
tion experiments. The first structured sparsity model accounts 
for the fact that the large wavelet coefficients of piecewise 
smooth signals and images tend to live on a rooted, connected 
tree structure [17]. Using the fact that the number of such 
trees is much smaller than (^), the number of if -sparse 
signal supports in TV dimensions, we prove that a tree-based 
CoSaMP algorithm needs only M — O (K) measurements 
to robustly recover tree-sparse and tree-compressible signals. 
This provides a significant reduction against the standard CS 
requirement M — 0(Klog(N/K)) as the signal length N 
increases. Figure 1 indicates the potential performance gains 
on a tree-compressible, piecewise smooth signal. 

The second structured sparsity model accounts for the 
fact that the large coefficients of many sparse signals clus- 
ter together [8,9]. Such a so-called block sparse model is 
equivalent to a joint sparsity model for an ensemble of J, 
length-iV signals [10], where the supports of the signals' 
large coefficients are shared across the ensemble. Using the 
fact that the number of clustered supports is much smaller 
than (j^), we prove that a block-based CoSaMP algorithm 
needs only M = O (JK + K log(^)) measurements to 
robustly recover block-sparse and block-compressible signals. 
In contrast, standard CS requires M = O ( J K \og{N / K)); 
block sparsity reduces the dependence of M on the signal 
length 7Y, particularly for large block sizes J. 

Our new theory and methods relate to a small body of 
previous work aimed at integrating structured sparsity into 
CS. Several groups have developed structured sparse signal 
recovery algorithms [6-8, 18-24]; however, their approaches 
have either been ad hoc or focused on a single structured 
sparsity model. Most previous work on unions of subspaces 
[6, 7, 24] has focused exclusively on strictly sparse signals and 
has considered neither compressibility nor feasible recovery 
algorithms. A related CS modeling framework for structured 
sparse and compressible signals [9] collects the N samples 
of a signal into D groups, D < N, and allows signals 
where K out of D groups have nonzero coefficients. This 
framework is immediately applicable to block-sparse signals 
and signal ensembles with common sparse supports. While 
[9] provides recovery algorithms, measurement bounds, and 
recovery guarantees similar to those provided in Section VI, 
our proposed framework has the ability to focus on arbitrary 
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(a) test signal (b) CoSaMP (c) ^-norm min. (d) model-based recovery 

(RMSE = 1.123) (PvMSE = 0.751) (RMSE = 0.037) 

Fig. 1, Example performance of structured signal recovery, (a) Piecewise smooth HeaviSine test signal of length N = 1024. This signal is compressible under 
a connected wavelet tree model. Signal recovered from M = 80 random Gaussian measurements using (b) the iterative recovery algorithm CoSaMP, (c) standard 
t\ -norm minimization via linear programming, and (d) the wavelet tree-based CoSaMP algorithm from Section V. In all figures, root mean-squared error (RMSE) 
values are normalized with respect to the £2 norm of the signal. 



subsets of the (^) groups that yield more elaborate structures, 
such as connected subtrees for wavelet coefficients. To the 
best of our knowledge, our general algorithmic framework for 
model-based recovery, the concept of a model-compressible 
signal, and the associated RAmP are new to the literature. 

This paper is organized as follows. A review of the CS 
theory in Section II lays out the foundational concepts that 
we extend to the model-based case in subsequent sections. 
Section III develops the concept of structured sparse signals 
and introduces the concept of structured compressible signals. 
We also quantify how structured sparsity models improve the 
measurement and recovery process by exploiting the model- 
based RIP for structured sparse signals and by introducing the 
RAmP for structured compressible signals. Section IV indi- 
cates how to tune CoSaMP to incorporate structured sparsity 
models and establishes its robustness properties for structured 
sparse and structured compressible signals; the modifications 
to the IHT algorithm are very similar, so we defer them to 
an appendix to reduce redundancy. Sections V and VI then 
specialize our theory to the special cases of wavelet tree and 
block sparse signal models, respectively, and report on a series 
of numerical experiments that validate our theoretical claims. 
We conclude with a discussion in Section VII. To make the 
paper more readable, all proofs are relegated to a series of 
appendices. 

II. Background on Compressive Sensing 

A. Sparse and compressible signals 

Given a basis {ipi}^L lt we can represent every signal x € 
M N in terms of N coefficients {ai}fL 1 as x = Q^i! 
stacking the ipi as columns into the N x N matrix 'J', we 
can write succinctly that x = ^a. In the sequel, we will 
assume without loss of generality that the signal x is sparse 
or compressible in the canonical domain so that the sparsity 
basis ^ is the identity and a — x. 

A signal x is K-sparse if only K <C N entries of x 
are nonzero. We call the set of indices corresponding to the 
nonzero entries the support of x and denote it by supp(ir). 
The set of all if-sparse signals is the union of the („), K- 
dimensional subspaces aligned with the coordinate axes in Mr. 
We denote this union of subspaces by T,k- 



Many natural and manmade signals are not strictly sparse, 
but can be approximated as such; we call such signals com- 
pressible. Consider a signal x whose coefficients, when sorted 
in order of decreasing magnitude, decay according to the 
power law 

|zz W | < Gi- X <\ i=l,...,N, (1) 

where X indexes the sorted coefficients. Thanks to the rapid 
decay of their coefficients, such signals are well-approximated 
by if-sparse signals. Let Xk € represent the best K- 
term approximation of x, which is obtained by keeping just 
the first K terms in Xxu) from (1). Denote the error of this 
approximation in the £ p norm as 

o-k{x) p := min ||x - x\\ p = \\x - x K \\ p , (2) 

where the £ p norm of the vector x is defined as \\x\\ p = 

(y^L 1 \xi\ p ^j for < p < 00. Then, for r < p, we have 
that 

a K {x) v <{rs)- 1,p GK- s , (3) 

with s = - — -. That is, when measured in the £ v norm, 
the signal's best approximation error has a power-law decay 
with exponent s as K increases. In the sequel we let p = 2, 
yielding s = 1/r — 1/2, and we dub a signal that obeys (3) 
an s-compressible signal. 

The approximation of compressible signals by sparse 
signals is the basis of transform coding as is used in algorithms 
like JPEG and JPEG2000 [1]. In this framework, we acquire 
the full TV-sample signal x; compute the complete set of 
transform coefficients a via a = fy~ 1 x; locate the K largest 
coefficients and discard the (N — K) smallest coefficients; and 
encode the K values and locations of the largest coefficients. 
While a widely accepted standard, this sample-then-compress 
framework suffers from three inherent inefficiencies. First, we 
must start with a potentially large number of samples N even 
if the ultimate desired K is small. Second, the encoder must 
compute all of the N transform coefficients a, even though it 
will discard all but K of them. Third, the encoder faces the 
overhead of encoding the locations of the large coefficients. 
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B. Compressive measurements and the restricted isometry 
property 

Compressive sensing (CS) integrates the signal acquisi- 
tion and compression steps into a single process [2-A]. In 
CS we do not acquire x directly but rather acquire M < N 
linear measurements y = $x using an M x N measurement 
matrix $. We then recover x by exploiting its sparsity or 
compressibility. Our goal is to push M as close as possible to 
if in order to perform as much signal "compression" during 
acquisition as possible. 

In order to recover a good estimate of x (the if largest 
Xi's, for example) from the M compressive measurements, the 
measurement matrix $ should satisfy the restricted isometry 
property (RIP) [3]. 

Definition 1: An M x N matrix $ has the K -restricted 
isometry property (if -RIP) with constant 8k if. f° r all X G 

(i-^)Nli<||Mli<(i + ^)IMI|. (4) 

In words, the if -RIP ensures that all submatrices of $ of 
size M x if are close to an isometry, and therefore distance 
(and information) preserving. Practical recovery algorithms 
typically require that $ have a slightly stronger 2if -RIP, 3if- 
RIP, or higher-order RIP in order to preserve distances between 
if -sparse vectors (which are 2if-sparse in general), three-way 
sums of if -sparse vectors (which are 3if -sparse in general), 
and other higher-order structures. 

While checking whether a measurement matrix $ satisfies 
the if -RIP is an NP-Complete problem in general [26], 
random matrices whose entries are independent and identi- 
cally distributed (i.i.d.) Gaussian, Rademacher (±1), or more 
generally subgaussian 2 work with high probability provided 
M = O (if log(N/K )). These random matrices also have 
a so-called universality property in that, for any choice of 
orthonormal basis matrix $\I r has the if -RIP with high 
probability. This is useful when the signal is sparse not in the 
canonical domain but in basis A random $ corresponds 
to an intriguing data acquisition protocol in which each 
measurement yj is a randomly weighted linear combination 
of the entries of x. 



C. Recovery algorithms 

Since there are infinitely many signal coefficient vectors 
x' that produce the same set of compressive measurements 
y = <l>x, to recover the "right" signal we exploit our a priori 
knowledge of its sparsity or compressibility. For example, we 
could seek the sparsest x that agrees with the measurements 

y- 

x = argmin ||x'||o s.t. y = <&x', (5) 

x' 

2 A random variable X is called subgaussian if there exists c > such 
that E(e xt ) < e c2 ' 2 / 2 for all t 6 R. Examples include the Gaussian, 
Bernoulli, and Rademacher random variables, as well as any bounded random 
variable. [25] 



where the Iq "norm" of a vector counts its number of nonzero 
entries. While this optimization can recover a if -sparse signal 
from just M — 2K compressive measurements, it is unfor- 
tunately a combinatorial, NP-hard problem [26]; furthermore, 
the recovery is not stable in the presence of noise [4]. 

Practical, stable recovery algorithms rely on the RIP 
(and therefore require at least M = 0(K\og(N/K)) mea- 
surements); they can be grouped into two camps. The first 
approach convexities the l§ "norm" minimization (5) to the 
^i-norm minimization 



argmin ||x'||i s.t. y = <f>x'. 



(6) 



This corresponds to a linear program that can be solved in 
polynomial time [2, 3]. Adaptations to deal with additive noise 
in y or x include basis pursuit with denoising (BPDN) [27], 
complexity-based regularization [28], and the Dantzig Selec- 
tor [29]. 

The second approach finds the sparsest x agreeing with 
the measurements y through an iterative, greedy search. Algo- 
rithms such as matching pursuit, orthogonal matching pursuit 
[30], StOMP [31], iterative hard thresholding (IHT) [12-16], 
CoSaMP [11], and Subspace Pursuit (SP) [32] all revolve 
around a best L-term approximation for the estimated signal, 
with L varying for each algorithm; typically L is O (if). 



D. Performance bounds on signal recovery 

Given M = 0(K\og(N/K)) compressive measure- 
ments, a number of different CS signal recovery algorithms, in- 
cluding all of the ^i-norm minimization techniques mentioned 
above and the CoSaMP, SP, and IHT iterative techniques, 
offer provably stable signal recovery with performance close 
to optimal if -term approximation (recall (3)) [2, 3, 11, 16]. For 
a random $, all results hold with high probability. 

For a noise-free, if-sparse signal, these algorithms offer 
perfect recovery, meaning that the signal x recovered from the 
compressive measurements y = $x is exactly x = x. 

For a if-sparse signal x whose measurements are cor- 
rupted by noise n of bounded norm (that is, we measure 
y = <£>x + n) the mean-squared error of the signal x is 



||x-x|| 2 <C||n|| 2 , 
with C a small constant. 



(7) 



For an s-compressible signal x whose measurements are 
corrupted by noise n of bounded norm, the mean-squared error 
of the recovered signal x is 

||x-x|| 2 < C4x-XKh + C2-^=\\x-x K \\i+Cz\\n\\2. (8) 

V if 



Using (3) we can simplify this expression to 

dGK- s C 2 GK- S 



x — x 9 < 



+ 



+ C3IMI2. (9) 



/2s ' a -1/2 

For the recovery algorithm (6), we obtain a bound very similar 
to (8), albeit with the ^ 2 -norm error component removed [33]. 
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III. Structured Sparsity and Compressibility 

While many natural and manmade signals and images 
can be described to first-order as sparse or compressible, the 
support of their large coefficients often has an underlying inter- 
dependency structure. This phenomenon has received only 
limited attention by the CS community to date [6-9, 19-23]. 
In this section, we introduce a model-based theory of CS 
that captures such structure. A model reduces the degrees of 
freedom of a sparse/compressible signal by permitting only 
certain configurations of supports for the large coefficient. 
As we will show, this allows us to reduce, in some cases 
significantly, the number of compressive measurements M 
required to stably recover a signal. 

A. Structured sparse signals 

Recall from Section II-A that a if -sparse signal vector 
x lives in T, K C R w , which is a union of (^) subspaces 
of dimension if. Other than its if -sparsity, there are no 
further constraints on the support or values of its coefficients. 
A structured sparsity model endows the if -sparse signal x 
with additional structure that allows certain if -dimensional 
subspaces in Y^k and disallows others [6,7]. 

To state a formal definition of a structured sparsity model, 
let a; |n represent the entries of x corresponding to the set of 
indices O C {1, . . . , N}, and let f2 c denote the complement 
of the set il. 

Definition 2: A structured sparsity model Mk is defined 
as the union of vhk canonical if -dimensional subspaces 

m K 

M K = (J X m , s.t. X rn := {x : x\ Um G R K ,x\ n c = 0}, 

m— 1 

where {fii, . . . , fl mK } is the set containing all allowed sup- 
ports, with |fi m | = if for each m = 1, . . . , rriK, and each 
subspace X m contains all signals x with supp(x) C fl m . 

Signals from Mk are called K-structured sparse. Clearly, 
Mk C T,k and contains mx < (^) subspaces. 

In Sections V and VI below we consider two concrete 
structured sparsity models. The first model accounts for the 
fact that the large wavelet coefficients of piecewise smooth 
signals and images tend to live on a rooted, connected tree 
structure [17]. The second model accounts for the fact that 
the large coefficients of sparse signals often cluster together 
into blocks [8-10]. 

B. Model-based RIP 

If we know that the signal x being acquired is K- 
structured sparse, then we can relax the RIP constraint on the 
CS measurement matrix $ and still achieve stable recovery 
from the compressive measurements y = <frx [6,7]. 

Definition 3: [6,7] An M x N matrix $ has the Mr- 
restricted isometry property (Mk-R1P) with constant Sm k if. 



for all i e Mjf, we have 

(1 - S Mk )\\x\\ 2 2 < \\$xf 2 < (1 + S Mk )\\x\\1 (10) 

Blumensath and Davies [6] have quantified the number of 
measurements M necessary for a random CS matrix to have 
the Al/f-RIP with a given probability. 

Theorem 1: [6] Let Mk be the union of tuk subspaces 
of if -dimensions in R N . Then, for any t > and any 

M > — |— (\n(2m K )+K\n-^-+t) , 

<*Mk V 5 Mk J 

where c is a positive constant, an M x N i.i.d. subgaussian 
random matrix has the Mk-^P with constant 5m k with 
probability at least 1 — e _t . 

This bound can be used to recover the conventional CS 
result by substituting rriK = (^) ~ (Ne/K) K . Similarly, 
as the number of subspaces mx that arise from the structure 
imposed can be significantly smaller than the standard (^), the 
number of rows needed for a random matrix to have the M.k~ 
RIP can be significantly lower than the number of rows needed 
for the standard RIP. The A-l/f-RIP property is sufficient for 
robust recovery of structured sparse signals, as we show below 
in Section IV-B. 

C. Structured compressible signals 

Just as compressible signals are "nearly if -sparse" and 
thus live close to the union of subspaces Sjc in R N , structured 
compressible signals are "nearly if -structured sparse" and live 
close to the restricted union of subspaces Mk- In this section, 
we make this new concept rigorous. Recall from (3) that we 
defined compressible signals in terms of the decay of their 
if -term approximation error. 

The £2 error incurred by approximating x G l w by the 
best structured sparse approximation in Mk is given by 

<tm k (x) := inf \\x - x\\ 2 . 

x£M K 

We define M B (x,K) as the algorithm that obtains the best 
K-term structured sparse approximation of x in the union of 
subspaces Mk'- 

M(x, if ) = arg _min ||a; — 55 1 1 2 . 

xEMk 

This implies that \\x — M.(x,K)\\z = om k ( x )- The decay of 
this approximation error defines the structured compressibility 
of a signal. 

Definition 4: The set of s-structured compressible sig- 
nals is defined as 

m s = {x G R N : <t Mk (x) < GR- S , 1 < if < N, G < 00} . 

Define |x|gji s as the smallest value of G for which this 
condition holds for x and s. 

We say that x G 2ft s is an s-structured compressible 
signal under the structured sparsity model Mk- These approx- 
imation classes have been characterized for certain structured 
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sparsity models; see Section V for an example. We will 
select the value of s for which the distance between the 
approximation errors cfm k { x ) and the corresponding bounds 
GK~ X I S is minimal. 



D. Nested model approximations and residual subspaces 

In conventional CS, the same requirement (RIP) is a 
sufficient condition for the stable recovery of both sparse 
and compressible signals. In model-based recovery, however, 
the class of structured compressible signals is much larger 
than that of structured sparse signals, since the union of 
subspaces defined by structured sparse signals does not contain 
all canonical if-dimensional subspaces. 

To address this difference, we introduce some additional 
tools to develop a sufficient condition for the stable recovery 
of structured compressible signals. We will pay particular at- 
tention to structured sparsity models A4k that generate nested 
approximations, since they are more amenable to analysis and 
computation. 

Definition 5: A structured sparsity model A4 = 
{Mi, M.2, ■ ■ ■} has the nested approximation property (NAP) 
if supp(M(x, K)) C supp(M(a;, K')) for all K < K' and for 
all x € R N . 

In words, a structured sparsity model generates nested 
approximations if the support of the best K'-teim structured 
sparse approximation contains the support of the best if-term 
structured sparse approximation for all K < K'. An important 
example of a NAP-generating structured sparse model is the 
standard compressible signal model of (3). 

When a structured sparsity model obeys the NAP, the 
support of the difference between the best j K -term structured 
sparse approximation and the best (j + l)if-term structured 
sparse approximation of a signal can be shown to lie in a 
small union of subspaces, thanks to the structure enforced by 
the model. This structure is captured by the set of subspaces 
that are included in each subsequent approximation, as defined 
below. 

Definition 6: The j th set of residual subspaces of size 
K is defined as TZ^ K (M) = {u e R N : u = M(x,jK) - 
M(x, (j - 1)10 for some x e U N }, for j = 1, . . . , \N/K]. 

Under the NAP, each structured compressible signal x 
can be partitioned into its best IT-term structured sparse 
approximation xt x , the additional components present in the 
best 2!T-term structured sparse approximation xt 2 , and so on, 
with x — Yl\=\ K1[ x Tj and x^ € TZj^i-M) for each j. Each 
signal partition xt, is a i^-sparse signal, and thus TZj t K(-M) 
is a union of subspaces of dimension K. We will denote by 
Rj the number of subspaces that compose IZj.xiM) and omit 
the dependence on A4 in the sequel for brevity. 

Intuitively, the norms of the partitions ||£tJ|2 decay as 
j increases for signals that are structured compressible. As 
the next subsection shows, this observation is instrumental in 
relaxing the isometry restrictions on the measurement matrix $ 



and bounding the recovery error for s-structured compressible 
signals when the model obeys the NAP. 



E. The restricted amplification property (RAmP) 

For exactly K- structured sparse signals, we discussed in 
Section III-B that the number of compressive measurements 
M required for a random matrix to have the .M if -RIP is de- 
termined by the number of canonical subspaces ttik via (11). 
Unfortunately, such structured sparse concepts and results do 
not immediately extend to structured compressible signals. 
Thus, we develop a generalization of the .M/f-RiP that we 
will use to quantify the stability of recovery for structured 
compressible signals. 

One way to analyze the robustness of compressible signal 
recovery in conventional CS is to consider the tail of the signal 
outside its IT-term approximation as contributing additional 
"noise" to the measurements of size \\&(x — x K )\\ 2 [11, 
16,33]. Consequently, the conventional X-sparse recovery 
performance result can be applied with the augmented noise 
n + — Xk )■ 

This technique can also be used to quantify the robustness 
of structured compressible signal recovery. The key quantity 
we must control is the amplification of the structured sparse 
approximation residual through $. The following property is 
a new generalization of the RIP and model-based RIP. 

Definition 7: A matrix $ has the (ck ,r)-restricted am- 
plification property (RAmP) for the residual subspaces 1Zj t x 
of model M if 



||<Mll<(i + ^)j 2r Hl 

for any u e TZj.x for each 1 < j < \N/K~\ . 



(11) 



The regularity parameter r > caps the growth rate of 
the amplification of u e TZj.x as a function of j. Its value can 
be chosen so that the growth in amplification with j balances 
the decay of the norm in each residual subspace IZj.K with j. 

We can quantify the number of compressive measure- 
ments M required for a random measurement matrix $ to 
have the RAmP with high probability; we prove the following 
in Appendix A. 

Theorem 2: Let $ be an M x W matrix with i.i.d. 
subgaussian entries and let the set of residual subspaces 1^j,K 
of the structured sparsity model M. contain Rj subspaces of 
dimension K for each 1 < j < \N / K~\ . If 



M > 



2K + 41n 



max 



R 3 N 
K 



2t 



l<j<\N/K] (jr^/TTT^-l)' 



(12) 



then the matrix $ has the (ex , r)-RAmP with probability 1 — 

e"*. 

The order of the bound of Theorem 2 is lower than 
O (K\og{N/K)) as long as the number of subspaces Rj 
grows slower than N K . 
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Armed with the RaMP, we can state the following result, 
which will provide robustness for the recovery of structured 
compressible signals; see Appendix B for the proof. 

Theorem 3: Let x <G 5Dt s be an s-structured compressible 
signal under a structured sparsity model M. that obeys the 
NAP. If $ has the (e^, r)-RAmP and r = s— 1, then we have 

\<$>(x-M(x,K))\\ 2 < C S VTT7^K- S \n 



K 



\x\m s , 



where C s is a constant that depends only on s. 



IV. Model-Based Signal Recovery Algorithms 

To take practical advantage of our new theory for model- 
based CS, we demonstrate how to integrate structured spar- 
sity models into two state-of-the-art CS recovery algorithms, 
CoSaMP [11] (in this section) and iterative hard thresholding 
(IHT) [12-16] (in Appendix C to avoid repetition). The key 
modification is simple: we merely replace the best if-term 
sparse approximation step in these greedy algorithms with a 
best if -term structured sparse approximation. Since at each 
iteration we need only search over the mx subspaces of Mk 
rather than (^) subspaces of S^, fewer measurements will 
be required for the same degree of robust signal recovery. Or, 
alternatively, using the same number of measurements, more 
accurate recovery can be achieved. 

After presenting the modified CoSaMP algorithm, we 
prove robustness guarantees for both structured sparse and 
structured compressible signals. To this end, we must define 
an enlarged union of subspaces that includes sums of elements 
in the structured sparsity model. 

Definition 8: The B-order sum for the set A4k, with 
B > 1 an integer, is defined as 



M B K 



x = Y J x {r \ withal £M K 



r=l 



Define Ms(x, K) as the algorithm that obtains the best 
approximation of x in the enlarged union of subspaces -Mj^: 



arg mm || x ■ 



X 2- 



We note that M(x, K) = Mi(x, K). Note also that for many 
structured sparsity models, we will have A4^ C Mbk, 
and so the algorithm M(x, BK) will provide a strictly better 
approximation than Mb(x, K). 



the IHT and SP algorithms amenable to modification; see 
Appendix C for details on IHT. Pseudocode for the modified 
CoSaMP algorithm is given in Algorithm 1, where denotes 
the Moore-Penrose pseudoinverse of A. 



B. Performance of structured sparse signal recovery 

We now study the performance of model-based CoSaMP 
signal recovery on structured sparse and structured compress- 
ible signals. A robustness guarantee for noisy measurements 
of structured sparse signals can be obtained using the model- 
based RIP (10). Our performance guarantee for structured 
sparse signal recovery will require that the measurement 
matrix $ be a near-isometry for all subspaces in for some 
B > 1. This requirement is a direct generalization of the 2K- 
RIP, 3if -RIP, and higher-order RIPs from the conventional CS 
theory. The following theorem is proven in Appendix D. 

Theorem 4: Let x E A4k and let y = Qx + n be a set 
of noisy CS measurements. If $ has an Al^-RIP constant of 
<>M 4 K < 0.1, then the signal estimate Xi obtained from iteration 
i of the model-based CoSaMP algorithm satisfies 



Ik — a?i||2 < 2 4 ||x|| 2 + 151 1 1 2 - 



(13) 



This guarantee matches that of the CoSaMP algorithm [11, 
Theorem 4.1]; however, our guarantee is only for structured 
sparse signals rather than for all sparse signals. 



C. Performance of structured compressible signal recovery 

Using the new tools introduced in Section III, we can 
provide a robustness guarantee for noisy measurements of 
structured compressible signals, using the RAmP as a con- 
dition on the measurement matrix $. 

Theorem 5: Let x e 9Jt s be an s-structured compressible 
signal from a structured sparsity model M. that obeys the NAP, 
and let y = $.x + n be a set of noisy CS measurements. If $ 
has the Al^-RIP with 5 M ^ < 0.1 and the (e K ,r)-RAmP 
with 6k < 0.1 and r = s — 1, then the signal estimate 
Xi obtained from iteration i of the model-based CoSaMP 
algorithm satisfies 



\x-%h < 2-1z|| 2 + 35||n||2 

+ZhC s \x\ m K- s {l + \n\N/K^). (14) 



A. Model-based CoSaMP 

We choose to modify the CoSaMP algorithm [11] for 
two reasons. First, it has robust recovery guarantees that are 
on par with the best convex optimization-based approaches. 
Second, it has a simple iterative, greedy structure based on 
a best B K -term approximation (with B a small integer) that 
is easily modified to incorporate a best if -term structured 
sparse approximation Mb(K,x). These properties also make 



Proof sketch. To prove the theorem, we first bound the optimal 
structured sparse recovery error for an s-structured compress- 
ible signal x e Wl s when the matrix $ has the (e^-,r)-RAmP 
with r < s — 1 (see Theorem 3). Then, using Theorem 4, we 
can easily prove the result by following the analogous proof 
in [11]. □ 

The standard CoSaMP algorithm also features a similar 
guarantee for structured compressible signals, with the con- 
stant changing from 35 to 20. 



s 
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Algorithm 1 Model-based CoSaMP 



Inputs: CS matrix measurements y, structured sparse approximation algorithm 
Output: if -sparse approximation x to true signal x 



x = , d = y; i = 

while halting criterion false do 

1. i <- i + 1 

2. e 4- $ T d 

3. n <- supp(M 2 (e, if)) 

4. T<- QUsupp(z;_i) 

5. 6| T <— 6| T c 

6. X, 4- M(6, X) 

7. d <- y - $^ 
end while 

return J <— a;,- 



{initialize} 



{form signal residual estimate} 

{prune residual estimate according to structure} 

{merge supports} 

{form signal estimate} 

{prune signal estimate according to structure} 
{update measurement residual} 



D. Robustness to model mismatch 

We now analyze the robustness of model-based CS recov- 
ery to model mismatch, which occurs when the signal being 
recovered from compressive measurements does not conform 
exactly to the structured sparsity model used in the recovery 
algorithm. 

We begin with optimistic results for signals that are 
"close" to matching the recovery structured sparsity model. 
First consider a signal x that is not if -structured sparse as 
the recovery algorithm assumes but rather (K + /^-structured 
sparse for some small integer k. This signal can be de- 
composed into xk, the signal's if-term structured sparse 
approximation, and x — Xk, the error of this approximation. 
For k < K , we have that x — Xk € IZ2, k- If the matrix $ 
has the (ck, r)-RAmP, then it follows than 

\\$(x - x K )\\ 2 < 2 r VT+7^\\x-x K \\ 2 . (15) 

Using equations (13) and (15), we obtain the following guar- 
antee for the i th iteration of model-based CoSaMP: 

\\x — Xi\\ 2 < 2~ 4 ||x|| 2 + 16 • 2*Vl + £k\\x — xk\\2 + 15||n||2- 

By noting that \\x — Xk\\2 is small, we obtain a guarantee that 
is close to (13). 

Second, consider a signal x that is not s-structured 
compressible as the recovery algorithm assumes but rather 
(s — e)-structured compressible. The following bound can be 
obtained under the conditions of Theorem 5 by modifying the 
argument in Appendix B: 

\\x-Xih < 2-iM|2 + 35^||n|| 2 

+c s \x\ ms K- s (i + . 

As e becomes smaller, the factor ^7-^1 - 1 a pp roac hes 
log [ N/K~\ , matching (14). In summary, as long as the devi- 
ations from the structured sparse and structured compressible 
classes are small, our model-based recovery guarantees still 
apply within a small bounded constant factor. 



We end with an intuitive worst-case result for signals that 
are arbitrarily far away from structured sparse or structured 
compressible. Consider such an arbitrary x G R N and compute 
its nested structured sparse approximations XjK = M(x,jK), 
j = 1, . . . , \N/K~\. If x is not structured compressible, 
then the structured sparse approximation error <jjK{x) is not 
guaranteed to decay as j decreases. Additionally, the number 
of residual subspaces IZj.K could be as large as (^); that 
is, the j th difference between subsequent structured sparse 
approximations XTj = xjk — X(j-i)K might lie in any arbi- 
trary if-dimensional subspace. This worst case is equivalent 
to setting r = and Rj = (^) in Theorem 2. It is easy to see 
that the resulting condition on the number of measurements M 
matches that of the standard RIP for CS. Hence, if we inflate 
the number of measurements to M — O (K\og(N/K)) (the 
usual number for conventional CS), the performance of model- 
based CoSaMP recovery on an arbitrary signal x follows the 
distortion of the best if-term structured sparse approximation 
error of x within a bounded constant factor. 



E. Computational complexity of model-based recovery 

The computational complexity of a structured signal 
recovery algorithm differs from that of a standard algorithm by 
two factors. The first factor is the reduction in the number of 
measurements M necessary for recovery: since most current 
recovery algorithms have a computational complexity that 
is linear in the number of measurements, any reduction in 
M reduces the total complexity. The second factor is the 
cost of the structured sparse approximation. The if-term 
approximation used in most current recovery algorithms can 
be implemented with a simple sorting operation (O (N log N) 
complexity, in general). Ideally, the structured sparsity model 
should support a similarly efficient approximation algorithm. 

To validate our theory and algorithms and demonstrate 
their general applicability and utility, we now present two 
specific instances of model-based CS and conduct a range of 
simulation experiments. 
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V. Example: Wavelet Tree Model 

Wavelet decompositions have found wide application in 
the analysis, processing, and compression of smooth and 
piecewise smooth signals because these signals are X-sparse 
and compressible, respectively [1]. Moreover, the wavelet 
coefficients can be naturally organized into a tree structure, 
and for many kinds of natural and manmade signals the 
largest coefficients cluster along the branches of this tree. This 
motivates a connected tree model for the wavelet coefficients 
[34-36]. 

While CS recovery for wavelet-sparse signals has been 
considered previously [19-23], the resulting algorithms inte- 
grated the tree constraint in an ad-hoc fashion. Furthermore, 
the algorithms provide no recovery guarantees or bounds on 
the necessary number of compressive measurements. 



A. Tree-sparse signals 

We first describe tree sparsity in the context of sparse 
wavelet decompositions. We focus on one-dimensional signals 
and binary wavelet trees, but all of our results extend directly 
to d-dimensional signals and 2 rf -ary wavelet trees. 

Consider a signal x of length N — 2 1 , for an integer 
value of /. The wavelet representation of x is given by 

1-1 2*-l 
X = VqV + ^ X! W h^h^ 

i=0 j=o 

where v is the scaling function and is the wavelet function 
at scale i and offset j. The wavelet transform consists of the 
scaling coefficient vo and wavelet coefficients Wi j at scale i, 
< i < I — 1, and position j, < j < 2 l — 1. In terms 
of our earlier matrix notation, x has the representation x = 
^>a, where VP is a matrix containing the scaling and wavelet 
functions as columns, and a = [i>o u>o.o wi.o u>i,i W2,o ■ ■ ] T 
is the vector of scaling and wavelet coefficients. We are, of 
course, interested in sparse and compressible a. 

The nested supports of the wavelets at different scales 
create a parent/child relationship between wavelet coefficients 
at different scales. We say that if j_i u/21 is the parent of Wij 
and that u^+i^j and ^,+1.2^+1 are the children of w^j. These 
relationships can be expressed graphically by the wavelet 
coefficient tree in Figure 2. 

Wavelet functions act as local discontinuity detectors, 
and using the nested support property of wavelets at different 
scales, it is straightforward to see that a signal discontinuity 
will give rise to a chain of large wavelet coefficients along a 
branch of the wavelet tree from a leaf to the root. Moreover, 
smooth signal regions will give rise to regions of small wavelet 
coefficients. This "connected tree" property has been well- 
exploited in a number of wavelet-based processing [17,37, 
38] and compression [39,40] algorithms. In this section, we 
will specialize the theory developed in Sections III and IV to 
a connected tree model T. 




Fig. 2. Binary wavelet tree for a one-dimensional signal. The squares denote 
the large wavelet coefficients that arise from the discontinuities in the piecewise 
smooth signal drawn below; the support of the large coefficients forms a rooted, 
connected tree. 

A set of wavelet coefficients forms a connected sub- 
tree if, whenever a coefficient Wi.j E £1, then its parent 
w i-i,\j/2~] E ft as well. Each such set ft defines a subspace 
of signals whose support is contained in ft; that is, all 
wavelet coefficients outside fl are zero. In this way, we define 
the structured sparsity model Tk as the union of all K- 
dimensional subspaces corresponding to supports 57 that form 
connected subtrees. 

Definition 9: Define the set of K-tree sparse signals as 

X = V V + ^ ^2 W i,3^i,3 '■ Hoc = 0, 
i=0 j=l 

|fi| = K, fi forms a connected subtree j> . 

To quantify the number of subspaces in Tk, it suffices 
to count the number of distinct connected subtrees of size K 
in a binary tree of size N. We prove the following result in 
Appendix E. 

Proposition 1: The number of subspaces in Tk obeys 

T K < ^£ for K > log 2 N and T K < ^TT for K < 
log 2 N. 

To simplify the presentation in the sequel, we will simply use 
the weaker bound Tk < K +i ^ or a ^ va l ues °f K an d N. 

B. Tree-based approximation 

To implement tree-based signal recovery, we seek an 
efficient algorithm T(x, K) to solve the optimal approximation 

x K = arg min — £|| 2 . (16) 

xeT K 

Fortuitously, an efficient solver exists, called the condensing 
sort and select algorithm (CSS A) [34-36]. Recall that subtree 
approximation coincides with standard A'-term approximation 
(and hence can be solved by simply sorting the wavelet 
coefficients) when the wavelet coefficients are monotonically 
nonincreasing along the tree branches out from the root. The 
CSSA solves (16) in the case of general wavelet coefficient 



10 



BARANIUK et ah: MODEL-BASED COMPRESSIVE SENSING 



values by condensing the nonmonotonic segments of the tree 
branches using an iterative sort-and-average routine during a 
greedy search through the nodes. For each node in the tree, the 
algorithm calculates the average wavelet coefficient magnitude 
for each subtree rooted at that node, and records the largest 
average among all the subtrees as the energy for that node. 
The CSSA then searches for the unselected node with the 
largest energy and adds the subtree corresponding to the node's 
energy to the estimated support as a supernode: a single node 
that provides a condensed representation of the corresponding 
subtree [36]. Condensing a large coefficient far down the tree 
accounts for the potentially large cost (in terms of the total 
budget of tree nodes K) of growing the tree to that point. 

Since the first step of the CSSA involves sorting all 
of the wavelet coefficients, overall it requires O (N log N) 
computations. However, once the CSSA grows the optimal 
tree of size K, it is trivial to determine the optimal trees of 
size < K and computationally efficient to grow the optimal 
trees of size > K [34]. 

The constrained optimization (16) can be rewritten as an 
unconstrained problem by introducing the Lagrange multiplier 
A [41]: 



mm || x ■ 



X(\\a\\ -K), 



where T = U„ =1 7^ and a are the wavelet coefficients of 
x. Except for the inconsequential \K term, this optimization 
coincides with Donoho's complexity penalized sum of squares 
[41], which can be solved in only O (N) computations using 
coarse-to-fine dynamic programming on the tree. Its primary 
shortcoming is the nonobvious relationship between the tuning 
parameter A and and the resulting size K of the optimal 
connected subtree. 



C. Tree-compressible signals 

Specializing Definition 2 from Section III-C to T, we 
make the following definition. 

Definition 10: Define the set of s-tree compressible sig- 
nals as 

T s = {x e R N : \\x — T(x, K)\\ 2 < GK~ S , 
l<K<N,G<oo}. 

Furthermore, define |a;|<j; s as the smallest value of G for which 
this condition holds for x and s. 

Tree approximation classes contain signals whose wavelet 
coefficients have a loose (and possibly interrupted) decay 
from coarse to fine scales. These classes have been well- 
characterized for wavelet-sparse signals [35, 36, 40] and are in- 
trinsically linked with the Besov spaces B^(L p ([0, 1])). Besov 
spaces contain functions of one or more continuous variables 
that have (roughly speaking) s derivatives in L p ([0, 1]); the 
parameter q provides finer distinctions of smoothness. When 
a Besov space signal x a e B*(L p ([0, 1])) with s > 1/p — 1/2 
is sampled uniformly and converted to a length- N vector x, 



its wavelet coefficients belong to the tree approximation space 
1 S , with 

\xn\% s ~ ||^a|U p ([0,l]) + ||a:o||B«(L p ([0,l]))j 

where "x" denotes an equivalent norm. The same result holds 

if s = 1/p — 1/2 and q < p. 



D. Stable tree-based recovery from compressive measurements 

For tree-sparse signals, by applying Theorem 1 and 
Proposition 1, we find that a subgaussian random matrix has 
the Tff-RIP property with constant 8t k and probability 1 — e~* 
if the number of measurements obeys 



M > 



^,48 , 512 
Kin— +ln— - +t 

S Tk Ke 2 



Thus, the number of measurements necessary for stable re- 
covery of tree-sparse signals is linear in K, without the 
dependence on N present in conventional non-model-based 
CS recovery. 

For tree-compressible signals, we must quantify the num- 
ber of subspaces Rj in each residual set IZj.K for the approx- 
imation class. We can then apply the theory of Section IV-C 
with Proposition 1 to calculate the smallest allowable M via 
Theorem 5. 

Proposition 2: The number of if-dimensional subspaces 
that comprise 1Zj t K obeys 

( 2e )*W+i) 



Rj ~ (Kj + K + l)(Kj + iy 



(17) 



Using Proposition 2 and Theorem 5, we obtain the following 
condition for the matrix $ to have the RAmP, which is proved 
in Appendix F. 

Proposition 3: Let $ be an M x JV matrix with i.i.d. 
subgaussian entries. If 



M > 



2 (l0if + 21n 



N 



K(K+1)(2K+1) 



+ t 



then the matrix $ has the (e^,s)-RAmP for the structured 
sparsity model T and all s > 0.5 with probability 1 — e~*. 

Both cases give a simplified bound on the number of 
measurements required as M — O (K), which is a substantial 
improvement over the M — 0(K\og(N/K)) required by 
conventional CS recovery methods. Thus, when $ satisfies 
Proposition 3, we have the guarantee (14) for sampled Besov 
space signals from B s q (L p {{§, 1])). 



E. Experiments 

We now present the results of a number of numerical 
experiments that illustrate the effectiveness of a tree-based 
recovery algorithm. Our consistent observation is that explicit 
incorporation of the structured sparsity model in the recovery 
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process significantly improves the quality of recovery for 
a given number of measurements. In addition, model-based 
recovery remains stable when the inputs are no longer tree- 
sparse, but rather are tree-compressible and/or corrupted with 
differing levels of noise. We employ the Daubechies-6 wavelet 
basis for sparsity, and recover the signal using model-based 
CoSaMP (Algorithm 1) with a CSSA-based structured sparse 
approximation step in all experiments. 

We first study one-dimensional signals that match the 
connected wavelet-tree model described above. Among such 
signals is the class of piecewise smooth functions, which are 
commonly encountered in analysis and practice. 

Figure 1 illustrates the results of recovering the tree- 
compressible HeaviSine signal of length N — 1024 from 
M — 80 noise-free random Gaussian measurements using 
CoSaMP, £i-norm minimization using the ll_eq solver from 
the l\ -Magic toolbox, 3 and our tree-based recovery algorithm. 
It is clear that the number of measurements (M = 80) is 
far fewer than the minimum number required by CoSaMP 
and £i-norm minimization to accurately recover the signal. 
In contrast, tree-based recovery using K = 26 is accurate and 
uses fewer iterations to converge than conventional CoSaMP. 
Moreover, the normalized magnitude of the squared error for 
tree-based recovery is equal to 0.037, which is remarkably 
close to the error between the noise-free signal and its best 
if-term tree-approximation (0.036). 

Figure 3(a) illustrates the results of a Monte Carlo simu- 
lation study on the impact of the number of measurements M 
on the performance of model-based and conventional recovery 
for a class of tree-sparse piecewise polynomial signals. Each 
data point was obtained by measuring the normalized recovery 
error of 500 sample trials. Each sample trial was conducted 
by generating a new piecewise polynomial signal of length 
N = 1024 with five polynomial pieces of cubic degree 
and randomly placed discontinuities, computing its best K- 
term tree-approximation using the CSSA, and then measuring 
the resulting signal using a matrix with i.i.d. Gaussian en- 
tries. Model-based recovery attains near-perfect recovery at 
M = 3K measurements, while CoSaMP only matches this 
performance at M = 5K. 

For the same class of signals, we empirically compared 
the recovery times of our proposed algorithm with those of the 
standard approach (CoSaMP). Experiments were conducted 
on a Sun workstation with a 1.8GHz AMD Opteron dual- 
core processor and 2GB memory running UNIX, using non- 
optimized Matlab code and a function-handle based imple- 
mentation of the random projection operator As is evident 
from Figure 3(b), wavelet tree-based recovery is in general 
slower than CoSaMP. This is due to the fact that the CSSA 
step in the iterative procedure is more computationally de- 
manding than simple K— term approximation. Nevertheless, 
the highest benefits of model-based CS recovery are obtained 
around M = 3K; in this regime, the runtimes of the two 
approaches are comparable, with tree-based recovery requiring 

3 http://www. acm. caltech. edu/11 magic. 
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Fig. 4. Required overmeasuring factor M/ K to achieve a target recovery error 
| [ic — x\\2 < 2.5<tt k (a;) as a function of the signal length N for standard and 
model-based recovery of piecewise smooth signals. While standard recovery 
requires M to increase logarithmically with N, the required M is essentially 
constant for model-based recovery. 



fewer iterations and yielding much smaller recovery error than 
standard recovery. 

Figure 4 shows the growth of the overmeasuring factor 
M I K with the signal length N for conventional CS and 
model-based recovery. We generated 50 sample piecewise 
cubic signals and numerically computed the minimum number 
of measurements M required for the recovery error x|| 2 < 
2.5<tt k (%), the best tree-approximation error, for every sample 
signal. The figure shows that while doubling the signal length 
increases the number of measurements required by standard 
recovery by K, the number of measurements required by 
model-based recovery is constant for all N. These experi- 
mental results verify the theoretical performance described in 
Proposition 3. 

Further, we demonstrate that model-based recovery per- 
forms stably in the presence of measurement noise. We 
generated sample piecewise polynomial signals as above, 
computed their best if -term tree-approximations, computed 
M measurements of each approximation, and finally added 
Gaussian noise of variance o to each measurement, so that 
the expected variance B[||n||2] = oyM. We emphasize that 
this noise model implies that the energy of the noise added 
will be larger as M increases. Then, we recovered the signals 
using CoSaMP and model-based recovery and measured the 
recovery error in each case. For comparison purposes, we also 
tested the recovery performance of a £i-norm minimization 
algorithm that accounts for the presence of noise, which 
has been implemented as the ll_qc solver in the l\ -Magic 
toolbox. First, we determined the lowest value of M for which 
the respective algorithms provided near-perfect recovery in the 
absence of noise in the measurements. This corresponds to 
M = 3.5if for model-based recovery, M = hK for CoSaMP, 
and M = 4.5K for ^i-norm minimization. Next, we gener- 
ated 200 sample tree-structured signals, computed M noisy 
measurements, recovered the signal using the given algorithm 
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Fig. 3. Performance of CoSaMP vs. wavelet tree-based recovery on a class of tree-sparse signals, (a) Average normalized recovery error and (b) average runtime 
for each recovery algorithm as a function of the overmeasuring factor Mj K. The number of measurements M for which the wavelet tree-based algorithm obtains 
near-perfect recovery is much smaller than that required by CoSaMP. The penalty paid for this improvement is a modest increase in the runtime. 



~0.8 



CoSaMP (M = 5K) 
" I'l-minimization (M = 4.5K) 
-Model-based recovery (M = 3.5K) 




0.5 




(a) Peppers 



(b) CoSaMP 
(RMSE = 22.8 




(c) model-based rec. 
(RMSE = 11.1) 



Fig. 6. Example performance of standard and model-based recovery on 
images, (a) N = 128 X 128 = 16384-pixeJ Peppers test image. Image 
recovery from M = 5000 compressive measurements using (b) conventional 
CoSaMP and (c) our wavelet tree-based algorithm. 



Fig. 5. Robustness to measurement noise for standard and wavelet tree-based 
CS recovery algorithms. We plot the maximum normalized recovery error over 
200 sample trials as a function of the expected signal-to-noise ratio. The linear 
growth demonstrates that model-based recovery possesses the same robustness 
to noise as CoSaMP and t\ -norm minimization. 



and recorded the recovery error. Figure 5 illustrates the growth 
in maximum normalized recovery error (over the 200 sample 
trials) as a function of the expected measurement signal-to- 
noise ratio for the tree algorithms. We observe similar stability 
curves for all three algorithms, while noting that model-based 
recovery offers this kind of stability using significantly fewer 
measurements. 

Finally, we turn to two-dimensional images and a wavelet 
quadtree model. The connected wavelet-tree model has proven 
useful for compressing natural images [35]; thus, our algo- 
rithm provides a simple and provably efficient method for 
recovering a wide variety of natural images from compressive 
measurements. An example of recovery performance is given 
in Figure 6. The test image {Peppers) is of size N = 128 x 
128 = 16384 pixels, and we computed M = 5000 random 
Gaussian measurements. Model-based recovery again offers 
higher performance than standard signal recovery algorithms 
like CoSaMP, both in terms of recovery mean-squared error 
and visual quality. 



VI. Example: Block-Sparse Signals and Signal 
Ensembles 

In a block-sparse signal, the locations of the signifi- 
cant coefficients cluster in blocks under a specific sorting 
order. Block-sparse signals have been previously studied in 
CS applications, including DNA microarrays and magnetoen- 
cephalography [8,9]. An equivalent problem arises in CS 
for signal ensembles, such as sensor networks and MIMO 
communication [9,10,42]. In this case, several signals share 
a common coefficient support set. For example, when a 
frequency-sparse acoustic signal is recorded by an array of 
microphones, then all of the recorded signals contain the same 
Fourier frequencies but with different amplitudes and delays. 
Such a signal ensemble can be re-shaped as a single vector by 
concatenation, and then the coefficients can be rearranged so 
that the concatenated vector exhibits block sparsity. 

It has been shown that the block-sparse structure enables 
signal recovery from a reduced number of CS measurements, 
both for the single signal case [8,9,43] and the signal ensem- 
ble case [10], through the use of specially tailored recovery 
algorithms. However, the robustness guarantees for the algo- 
rithms [8,43] either are restricted to exactly sparse signals 
and noiseless measurements, do not have explicit bounds on 
the number of necessary measurements, or are asymptotic 
in nature. An optimization-based algorithm introduced in [9] 
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provides similar recovery guarantees to those obtained by the 
algorithm we present in this chapter; thus, our method can 
be interpreted as a greedy-based counterpart to that provided 
in [9]. 

In this section, we formulate the block sparsity model as 
a union of subspaces and pose an approximation algorithm 
on this union of subspaces. The approximation algorithm is 
used to implement block-based signal recovery. We also define 
the corresponding class of block-compressible signals and 
quantify the number of measurements necessary for robust 
recovery. 



A. Block-sparse signals 

Consider a class S of signal vectors x € M. JN , with J 
and N integers. This signal can be reshaped into a J x N 
matrix X, and we use both notations interchangeably in the 
sequel. We will restrict entire columns of X to be part of the 
support of the signal as a group. That is, signals X in a block- 
sparse model have entire columns as zeros or nonzeros. The 
measure of sparsity for X is its number of nonzero columns. 
More formally, we make the following definition. 

Definition 11: [8,9] Define the set of K-block sparse 
signals as 

S K = {X =[x! ... x N ]e R JxN such that 

x n = for n $ O, O C {1, . . . , N}, = K}. 

It is important to note that a if -block sparse signal has 
sparsity KJ, which is dependent on the size of the block J. 
We can extend this formulation to ensembles of J, length- 
N signals with common support. Denote this signal ensemble 
by {a?i, . . . ,xj}, with Xj € M. N , 1 < j < J. We formulate 
a matrix representation X of the ensemble that features the 
signal xj in its j th row: X = [x\ ... x,j] T . The matrix X 
features the same structure as the matrix X obtained from a 
block-sparse signal; thus, the matrix X can be converted into 
a block-sparse vector x that represents the signal ensemble. 



B. Block-based approximation 

To pose the block-based approximation algorithm, we 
need to define the mixed norm of a matrix. 

Definition 12: The (p, q) mixed norm of the matrix X = 
[xi X2 ••■ xn] is defined as 

/ N \ 1 ll 

U\\ {P , q ) = [£lM2j • 

When q = 0, ||^||( Pl o) simply counts the number of nonzero 
columns in X. 

We immediately find that ||X|j( pp ) = ||x|| p , with x the 
vectorization of X. Intuitively, we pose the algorithm S(X, K) 



to obtain the best block-based approximation of the signal X 
as follows: 

Xf = arg min \\X - X\\ (2 2) s.t. \\X\\t 2 n) < K. (18) 

It is easy to show that to obtain the approximation, it suffices 
to perform column-wise hard thresholding: let p be the if" 1 
largest ^2-norm among the columns of X. Then our approx- 
imation algorithm is S(X,K) = Xf. = [x% 1 ■■■xf CN ], 
where 

.S _ ( Xn \\Xnh > Pi 

I \\x n \\ 2 < p, 

for each 1 < n < N. Alternatively, a recursive approximation 
algorithm can be obtained by sorting the columns of X by their 
li norms, and then selecting the columns with largest norms. 
The complexity of this sorting process is O (NJ + NlogN). 



C. Block-compressible signals 

The approximation class under the block-compressible 
model corresponds to signals with blocks whose l 2 norm has 
a power-law decay rate. 

Definition 13: We define the set of s-block compressible 
signals as 

S s = {X = [ Xl ... x N ] e R JxN such that 

||zx W || 2 < Gi- S -^ 2 , l<i<N,S<^}, 

where 1 indexes the sorted column norms. 

We say that X is an s-block compressible signal if X € & s . 
For such signals, we have \\X — Xk\\(2,2) = a s K { x ) ^ 
dK- 3 , and \\X - X K \\ (2A) < G 2 K x l d - s . Note that the 
block-compressible model does not impart a structure to the 
decay of the signal coefficients, so that the sets 1^j,K are equal 
for all values of j; due to this property, the (6s K , s)-RAmP 
is implied by the <Sjf-RIP. Taking this into account, we can 
derive the following result from [11], which is proven similarly 
to Theorem 4. 

Theorem 6: Let a; be a signal from the structured sparsity 
model S, and let y = Qx + n be a set of noisy CS 
measurements. If $ has the 5^-RIP with 6 S 4. < 0.1, then 
the estimate obtained from iteration i of block-based CoSaMP, 
using the approximation algorithm (18), satisfies 

\\x-Sih < 2-iM| 2 + 20^||X-X£|| (2i2) 

+ ^=\\X -X%\\ i2A) + \\n\\ 2 ) . 



Thus, the algorithm provides a recovered signal of similar 
quality to approximations of X by a small number of nonzero 
columns. When the signal x is if-block sparse, we have that 
\\X - -Xjr||(2,2) = \\X — -Xj<-||(2,i) = 0' obtaining the same 
result as Theorem 4, save for a constant factor. 
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D. Stable block-based recovery from compressive measure- 
ments 

Since Theorem 6 poses the same requirement on the 
measurement matrix $ for sparse and compressible signals, the 
same number of measurements M is required to provide per- 
formance guarantees for block-sparse and block-compressible 
signals. The class Sk contains S = (^) subspaces of 
dimension JK . Thus, a subgaussian random matrix has the 
property with constant 5g K an d probability 1 — e~* 
if the number of measurements obeys 

2 / / 2N 12 \ \ 

M ^wA K r^ +JlB srJ + r <19) 

To compare with the standard CS measurement bound, the 
number of measurements required for robust recovery scales 
as M = 0(JK + K\og(N/K)), which is a substantial 
improvement over the M — O ( JK \og(N/K)) that would 
be required by conventional CS recovery methods. When the 
size of the block J is larger than \og(N/K), then this term 
becomes O (KJ); that is, it is linear on the total sparsity of 
the block-sparse signal. 

We note in passing that the bound on the number of 
measurements (19) assumes a dense subgaussian measurement 
matrix, while the measurement matrices used in [10] have 
a block-diagonal structure. To obtain measurements from an 
M x JN dense matrix in a distributed setting, it suffices to 
partition the matrix into J pieces of size M x N and calculate 
the CS measurements at each sensor with the corresponding 
matrix; these individual measurements are then summed to 
obtain the complete measurement vector. For large J, (19) 
implies that the total number of measurements required for 
recovery of the signal ensemble is lower than the bound for the 
case where each signal recovery is performed independently 
for each signal (M = O {JK \og(N/K))). 

E. Experiments 

We conducted several numerical experiments comparing 
model-based recovery to CoSaMP in the context of block- 
sparse signals. We employ the model-based CoSaMP recovery 
of Algorithm 1 with the block-based approximation algo- 
rithm (18) in all cases. For brevity, we exclude a thorough 
comparison of our model-based algorithm with ^i-norm mini- 
mization and defer it to future work. In practice, we observed 
that our algorithm performs several times faster than convex 
optimization-based procedures. 

Figure 7 illustrates an N — 4096 signal that exhibits 
block sparsity, and its recovered version from M = 960 
measurements using CoSaMP and model-based recovery. The 
block size J = 64 and there were K = 6 active blocks in 
the signal. We observe the clear advantage of using the block- 
sparsity model in signal recovery. 

We now consider block-compressible signals. An exam- 
ple recovery is illustrated in Figure 8. In this case, the £ 2 - 
norms of the blocks decay according to a power law, as 



described above. Again, the number of measurements is far 
below the minimum number required to guarantee stable re- 
covery through conventional CS recovery. However, enforcing 
the structured sparsity model in the approximation process 
results in a solution that is very close to the best 5-block 
approximation of the signal. 

Figure 9(a) indicates the decay in recovery error as 
a function of the numbers of measurements for CoSaMP 
and model-based recovery. We generated sample block-sparse 
signals as follows: we randomly selected a set of K blocks, 
each of size J, and endow them with coefficients that follow 
an i.i.d. Gaussian distribution. Each sample point in the curves 
is generated by performing 200 trials of the corresponding 
algorithm. As in the connected wavelet-tree case, we observe 
clear gains using model-based recovery, particularly for low- 
measurement regimes; CoSaMP matches model-based recov- 
ery only for M > 5K. 

Figure 9(b) compares the recovery times of the two 
approaches. For this particular model, we observe that our 
proposed approach is in general much faster than CoSaMP. 
This is because of two reasons: a) the block-based approx- 
imation step involves sorting fewer coefficients, and thus is 
faster than K— term approximation; b) block-based recovery 
requires fewer iterations to converge to the true solution. 

VII. Conclusions 

In this paper, we have aimed to demonstrate that there are 
significant performance gains to be made by exploiting more 
realistic and richer signal models beyond the simplistic sparse 
and compressible models that dominate the CS literature. 
Building on the unions of subspaces results of [6] and the proof 
machinery of [11], we have taken some first steps towards 
what promises to be a general theory for model-based CS 
by introducing the notion of a structured compressible signal 
and the associated restricted amplification property (RAmP) 
condition it imposes on the measurement matrix $. Our 
analysis poses the nested approximation property (NAP) as a 
sufficient condition that is satisfied by many structured sparsity 
models. 

For the volumes of natural and manmade signals and 
images that are wavelet-sparse or compressible, our tree- 
based CoSaMP and IHT algorithms offer performance that 
significantly exceeds today's state-of-the-art while requiring 
only M = O (K) rather than M — O (Klog(N/K)) random 
measurements. For block-sparse signals and signal ensembles 
with common sparse support, our block-based CoSaMP and 
IHT algorithms offer not only excellent performance but also 
require just M = O (JK) measurements, where JK is the 
signal sparsity. Furthermore, block-based recovery can recover 
signal ensembles using fewer measurements than the number 
required when each signal is recovered independently; we have 
shown such advantages using real-world data from environ- 
mental sensor networks [44]. Additional structured sparsity 
models have been developed using our general framework 
in [45] and [46]; we have also released a Matlab toolbox 
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(a) original block-sparse signal (b) CoSaMP (c) model-based recovery 

(RMSE = 0.723) (RMSE = 0.015) 

Fig. 7. Example performance of structured signal recovery for a block-sparse signal, (a) Example block-sparse signal of length N = 4096 
with K — 6 nonzero blocks of size J — 64. Recovered signal from M = 960 measurements using (b) conventional CoSaMP recovery and (c) 
block-based recovery. Standard recovery not only recovers spurious nonzeros, but also attenuates the magnitude of the actual nonzero entries. 



(a) signal 



(b) best 5-block approximation 
(RMSE = 0.116) 



(c) CoSaMP 
(RMSE = 0.711) 



(d) model-based recovery 
(RMSE = 0.195) 



Fig. 8. Example performance of structured signal recovery for block-compressible signals, (a) Example block-compressible signal, length 
N — 1024. (b) Best block-based approximation with K — 5 blocks. Recovered signal from M — 200 measurements using both (c) 
conventional CoSaMP recovery and (d) block-based recovery. Standard recovery not only recovers spurious significant entries, but also attenuates 
the magnitude of the actual significant entries 




Fig. 9. Performance of CoSaMP vs. block-based recovery on a class of block-sparse signals, (a) Average normalized recovery error and (b) 
average runtime for each recovery algorithm as a function of the overmeasuring factor M/K. CoSaMP does not match the performance of the 
block-based algorithm until M = 5K. Furthermore, the block-based algorithm has faster convergence time than CoSaMP. 



containing the corresponding model-based CS recovery algo- 
rithms, available at http : / / dsp . rice . edu/ software. 

There are many avenues for future work on model-based 
CS. We have only considered the recovery of signals from 
models that can be geometrically described as a union of 
subspaces; possible extensions include other, more complex 
geometries such as high-dimensional polytopes and nonlinear 
manifolds. We also expect that the core of our proposed 
algorithms — a structured sparse approximation step — can 
be integrated into other iterative algorithms, such as relaxed 
£i-norm minimization methods. Furthermore, our framework 
will benefit from the formulation of new structured sparsity 



models that are endowed with efficient structured sparse 
approximation algorithms. 

Appendix A 
Proof of Theorem 2 

To prove this theorem, we will study the distribution of 
the maximum singular value of a submatrix $>t of a matrix 
with i.i.d. Gaussian entries $ corresponding to the columns 
indexed by T. From this we obtain the probability that RAmP 
does not hold for a fixed support T. We will then evaluate 
the same probability for all supports T of elements of Ti-j,K, 
where the desired bound on the amplification is dependent on 
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the value of j. This gives us the probability that the RAmP 
does not hold for a given residual subspace set TZj,K ■ We fix 
the probability of failure on each of these sets; we then obtain 
probability that the matrix $ does not have the RAmP using a 
union bound. We end by obtaining conditions on the number 
of rows M of $ to obtain a desired probability of failure. 

We begin from the following concentration of measure for 
the largest singular value ofaMxif submatrix $7% |T| = K, 
of anMxJV matrix $ with i.i.d. subgaussian entries that are 
properly normalized [25,47,48]: 



P (<w($t) > l + y-^+r + /3j <e 



-Mt 2 /2 



For large enough M, (3 -C 1; thus we ignore this small 
constant in the sequel. By letting r = i r \J\ + (r — 1 — 
(with the appropriate value of j for T), we obtain 



P (%ax($T) > fVT+^) < e 



We use a union bound over all possible Rj supports for u e 
IZj.K to obtain the probability that $ amplifies the norm of 
some u by more than j r \/\ + £k- 

P (||$u|| 2 > (j r VTT^) ||u|| 2 for some u e Hj, K ) 
Bound the right hand side by a constant fi; this requires 

< e h(VM(rvr+^-i)-VK) 2 ^ (20) 

for each j. We use another union bound among the residual 
subspaces IZj.K to measure the probability that the RAmP 
does not hold: 



< 



A' 
7a 7 



To bound this probability by e *, we need /i — jje *; 
plugging this into (20), we obtain 

< e Kv^O r v^+^-i)-v^) 2 ^ e -t 
3 ~ N 

for each j. Simplifying, we obtain that for $ to posess the 
RAmP with probability 1 — e - *, the following must hold for 
all j: 

2 



M > 



( J 2 (inML+tj+VK^ 



V 



(21) 



/ 



Since (y/a + Vb) 2 < 2a + 26 for a, b > 0, then the hypothesis 
(12) implies (21), proving the theorem. □ 



Appendix B 
Proof of Theorem 3 

In this proof, we denote M(x, K) = xk for brevity. To 
bound \\&(x — Xk)\\2, we write x as 

\N/K] 
X = X K + ^ XT 3 ' 



3=2 



where 



XT 3 = X jK - X(j_i) K ,j = 2,..., \N/K] 



is the difference between the best jK structured sparse approx- 
imation and the best (J— 1)K structured sparse approximation. 
Additionally, each piece xt^ € TZj,K- Therefore, since $ 
satisfies the (e^, s — l)-RAmP, we obtain 



\®(x-x K )h = 




[N/K] 

< E h**^ ii 2 



3=2 

3=2 

Since x G Wl s , the norm of each piece can be bounded as 

\\XTjh = \\ x iK - X(j-1)K 1 1 2 

< ||a; - X(j-i) K \\2 + \\x - x jK \\2 

< \ x \ ms K- s (u - iy s + r s ) . 

Applying this bound in (22), we obtain 

\N/K] 



< 



< 



< 



Vi + 




Vi + 


tK 


K s 


Vi + 






Vi + 


etc 



3=2 

x \m s 

3=2 
{N/K] 



+ 



(i-i) s ' 3 s 



^' 1 1 



^2* 1 
x\m s £ - + -, 

j=2 J ' y 



/i-r W*l 



J=2 



It is easy to show, using Euler-Maclaurin summations, that 
J2l=i K] r 1 < ln\N/K]; we then obtain 



\Mx - x K )h < (2 s + 1)VTT7k~K- s In 
which proves the theorem. 



N 
K 



\x\<m 3 , 



□ 



Appendix C 
Model-based Iterative Hard Thresholding 

Our proposed model-based iterative hard thresholding 
(IHT) is given in Algorithm 2. For this algorithm, Theorems 4, 
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(a) original (b) IHT (c) model-based IHT 

(RMSE = 0.627) (RMSE = 0.080) 

Fig. 10. Example performance of model-based IHT. (a) Piecewise 
smooth HeaviSine test signal, length N — 1024. Signal recovered 
from M = 80 measurements using both (b) standard and (c) model- 
based IHT recovery. Root mean-squared error (RMSE) values are 
normalized with respect to the £2 norm of the signal. 



5, and 6 can be proven with only a few modifications: $ must 
have the AI^-RIP with S M 3^ < 0.1, and the constant factor 
in the bound changes from 15 to 4 in Theorem 4, from 35 to 
10 in Theorem 5, and from 20 to 5 in Theorem 6. 



To illustrate the performance of the algorithm, we repeat 
the HeaviSine experiment from Figure 1. Recall that N = 
1024, and M — 80 for this example. The advantages of using 
our tree-structured sparse approximation step (instead of mere 
hard thresholding) are evident from Figure 10. In practice, we 
have observed that our model-based algorithm converges in 
fewer steps than IHT and yields much more accurate results 
in terms of recovery error. 

Appendix D 
Proof of Theorem 4 

The proof of this theorem is identical to that of the 
CoSaMP algorithm in [11, Section 4.6], and requires a set 
of six lemmas. The sequence of Lemmas 1-6 below are 
modifications of the lemmas in [11] that are restricted to 
the structured sparsity model. Lemma 4 does not need any 
changes from [11], so we state it without proof. The proof of 
Lemmas 3-6 use the properties in Lemmas 1 and 2, which are 
simple to prove. 

Lemma 1: Suppose <i> has A4-RIP with constant 8m- Let 
be a support corresponding to a subspace in A4. Then we 
have the following handy bounds. 



Il*n« 

11 4^ 



l(*n*n)~ 



< \/1 + <5m|H| 



< 

< 
> 

< 
> 



1 



{l + S M )\\u 
(1 - 5 M )\\u 



1 



1 



M|2, 



\ U \V2- 



0M 



Lemma 2: Suppose $ has Mj^-RlP with constant 8m 2 k - 
Let 17 be a support corresponding to a subspace in M.k, and 
let x E M K - Then ||$£$x| n c|| 2 < S M % \\ x \a° || 2 . 

We begin the proof of Theorem 4 by fixing an iteration 
i > 1 of model-based CoSaMP. We write x — Xi-i for the 



signal estimate at the beginning of the i th iteration. Define the 
signal residual s — x — x, which implies that s E We 
note that we can write r = y — &x = <p(x — x) + n = $s + n. 

Lemma 3: (Identification) The set £1 = supp(M2(e, K)), 
where e = $ T r, identifies a subspace in M. 2 K , and obeys 

IMnc|| 2 < 0.2223|| S || 2 + 2.34|M| 2 . 



Proof of Lemma 3: Define the set II = supp(s). Let 
en = M 2 (e,if) be the structured sparse approximation to e 
with support S7, and similarly let en be the approximation to 
e with support II. Each approximation is equal to e for the 
coefficients in the support, and zero elsewhere. Since ft is the 
support of the best approximation in M. 2 K , we must have: 





ll e - e d| 2 


< 


lie — e n ||l, 


N 






N 


71=1 


[n] - en[n}) 2 


< 


71=1 






< 


E e N 2 , 








n^n 


JV 






N 




2 E e N 2 


> 




71=1 






71=1 7i^n 




E e N 2 


> 


E e N 2 , 




nefi 




7ien 




E e N 2 


> 


E e M 2 , 




nen\n 




7ien\o 




ll e lo\nl| 2 


> 


l e n\n Ha; 



where Q \ II denotes the set difference of and IT. These 
signals are in Ai% (since they arise as the difference of two 
elements from M. 2 K )\ therefore, we can apply the A1^--RIP 
constants and Lemmas 1 and 2 to provide the following bounds 
on both sides (see [11] for details): 



l|e|n\nl|2 < ^M^IMh + \j^+^M^\\ n \W ( 22 ) 
||e|n\n||2 > (1 - o~m 2 k )ll s b c ll2 - $m 2 k \\ s h 

-^/l + S Ml \\n\\ 2 . (23) 

Combining (22) and (23), we obtain 



\\s\aPh < 



(5 M 2 k + 5 M i,)\\s\\2 + 2^l + 5 M 2 K \\n\\ 2 



1 



The argument is completed by noting that 5 M 2 r < Sj^i < 0.1. 

□ 

Lemma 4: (Support Merger) Let fl be a set of at most 
2K indices. Then the set A = ft U supp(x) contains at most 
3K indices, and ||o:|ac|| 2 < ||s|nc|| 2 . 

Lemma 5: (Estimation) Let A be a support corresponding 
to a subspace in M%, and define the least squares signal 
estimate b by b\x — G>i>y, b\r c = 0. Then 

||z-6|| 2 < 1.112||x| AO || 2 + 1.06||n|| 2 . 
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Algorithm 2 Model-based Iterative Hard Thresholding 



Inputs: CS matrix $, measurements y, structured sparse approximation algorithm 

Output: if -sparse approximation x to true signal x 

x = , d = y; i = {initialize} 

while halting criterion false do 

1. i <- i + 1 

2. b <— + <P T d {form signal estimate} 

3. Xi <— M(b, K) {prune residual estimate according to structure} 

4. d <— y - $Xi {update measurement residual} 
end while 

return x <— x. 



Proof of Lemma 5: It can be shown [11] that 

\\x - b\\ 2 < N A c|| 2 + ll(^A $ A) _1 *I*a;|nc||2 + ||$nn|| 2 . 

Since A is a support corresponding to a subspace in and 
x € M.k, we use Lemmas 1 and 2 to obtain 



lk-&||2 < ||a;| A c|| 2 H 



+ 



"2 



•■>M\ 



1-5 



Mi 



< 1 + 



J M\ 



1 - S Ml 



\x\nch + 



Finally, note that &m 3 k < S M 4_ < 0.1. 



□ 



Lemma 6: (Pruning) The pruned approximation Xi — 
M(b, K) is such that 

\\x- Xl \\ 2 < 2||a;-6|| 2 . 



Proof of Lemma 6: Since Xi is the best approximation in M K 
to b, and x £ Mk, we obtain 



\\x - Xi\\ 2 < \\x - b\\ 2 + \\b- Xi\\ 2 < 2\\x - b\\ 2 . 



□ 



We use these lemmas in reverse sequence for the inequal- 
ities below: 

\\x-Xi\\ 2 < 2||2B — &||a, 

< 2(1.112||x| A c|| 2 + 1.06||n|| 2 ), 

< 2.224||s| n c|| 2 + 2.12||n|| 2 , 

< 2.224(0.2223||s|| 2 + 2.34||n|| 2 ) + 2.12||n|| 2 , 

< 0.5||s|| 2 + 7.5||n|| 2 , 

< 0.5||o; — ||a + 7.5||n|| 2 . 

From the recursion on x i7 we obtain \\x — Xi\\ 2 < 2~ 4 ||a;|| 2 + 
15||n|| 2 . This completes the proof of Theorem 4. □ 

Appendix E 
Proof of Proposition 1 

When K < log 2 N, the number of subtrees of size K of 
a binary tree of size N is the Catalan number [49] 



using Stirling's approximation. When K > log 2 N, we parti- 
tion this count of subtrees into the numbers of subtrees tx,h 
of size K and height h, to obtain 

log 2 W 

Tr,n = ^ tx,h 

We obtain the following asymptotic identity from [49, page 
51]: 



4 *+i.b 

t K ,h - -a 2^ 



/l 4 



m>l 



9K 

— (27rm) 4 - 3(27rm) 2 



K(27rm) 2 

e h 2 



< 



4^+2 



+ 4 k O 



h 5 



E 

m>l 



IK 

— (2^m) 4 - 3(2^m) 2 



h 4 J ' 



e h 2 — . (24) 



We now simplify the formula slightly: we seek a bound 
for the sum term (which we denote by (3h for brevity): 



& = E 



m>l 



-^-(27rm) 4 -3(27rm) 2 



g(27TTn)^ 

e h 2 



< E|mv 



Let m n 



m>l 



K(27rm)^ 
j? 



(25) 



, the value of m for which the term inside 



max ~~ ttVW 

the sum (25) is maximum; this is not necessarily an integer. 
Then, 



LmmaxJ-l 



2K 



K(2im) i 
K* 



m— 1 



2if 



— (2nm) i e K 



1 



< 



h 2 

m=L m maxJ 

+ E -tfV™) e 

m> [m max ] +1 

[m max j K(2.x) 2 

— -(27tx) e rfx 



^ 2 



2^ 



— (2nm) 



<-K,N 



1 (2K\ (2e) K 



K+ 1 V K J ~ K+ 1' 



m— L^TlmaxJ 

/*°° 



e fc 2 <ix, 
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where the second inequality comes from the fact that the series 
in the sum is strictly increasing for m < L™maxJ and strictly 
decreasing for m > |~r7j max ] • One of the terms in the sum can 
be added to one of the integrals. If we have that 

1 • & < (2tt rm max l) 4 e 



(2tt [m max j ) e 
then we can obtain 



(26) 



It is easy to show, using Euler-Maclaurin summations, that 

b b b I 

V j' 1 < In and V j~ 2 < -; 

^ J ~ a- 1 ^ ~ a- 1 

we then obtain 



j=a 



rm max ] 2x 

2K , _ , , A k(2^\ 

m max I ) 



' K,N 



h 2 



(2irx) 4 e ^ dx 



< 



< 



6 



:hl 



log 2 iV 



+ 



128 



K [\og 2 K\ e 2 Llog 2 ifJ 



A K+A 



< 



4K+4 



+ -^-(2tt r™ max l) 4 e 

— (27rx) 4 e T? dx. 
When the opposite of (26) is true, we have that 



L»™maxJ 



2K 



(2nxft 



K (2ttx) 2 



'dx 



2K , ... K(2t, [m m „j ) 2 

+ -^-(27r |ro max J) e — h 3 

2K , n4 iC(2xx) 2 

— —\2-kx) e h 1 dx. 

'L"lmaxJ " 

Since the term in the sum reaches its maximum for m max , 
will have in all three cases that 

f°° 2K . , 4 k( 2j x) 2 
0H< J i ^(^xfe ^^dx+—. 

We perform a change of variables u = 2ttx and define a 
h/V2K to obtain 

8h 2 



Ke 2 [log 2 K\ ~ Ke 2 ' 
This proves the proposition. □ 



Appendix F 
Proof of Proposition 3 

We wish to find the value of the bound (12) for 
the subspace count given in (17). We obtain M > 

maxi<j<^ N / K -\ Mj, where 



we 



2X + 41n- 



2t 



K(Kj + l)(Kj + K + l) 
We separate the terms that are linear on K and j, and obtain 

1 



Ph < ±- I \u i e~^l^ 2 dx + 
2vr J Q a 2 



Mi = 



1 



Ke 2 

2 /o_2 



3 ' ovr+^-i) ! 



X(3 + 41n2) + 8 J ft:j(l + ln2) 



< __ f -L- u*e-» 2 ^ 2 dx+ 8h2 

2(TV27r J-oo \J2-kg 



+4 In 



N 



Ke 2 ' 



K(Kj + l)(Kj + K + l) 
1 



+ 2t 



Using the formula for the fourth central moment of a Gaussian 
distribution: 



(j s -°- 5 VTT^-r - 5 ) 

#(3 + 4 In 2) 



we obtain 



0h < 



— u 

00 \2lT(J 

3a 3 8h 2 



4 e-« ' 2 ° dx = 3a 4 , 



8K(1 +ln2) + 



■In 



N 



3h 3 8h 2 
+ 



2V2^ Ke 2 8V^ Ke2 ' 
Thus, (24) simplifies to 

A K ( 6 128 

tK,h < 



K \hV^K h 2 e 2 
Correspondingly, T K>N becomes 



2t 

j ~ + i)(#j + icTT) + 7 

The sequence {Mj}^^ is a decreasing sequence, since the 
denominators are decreasing sequences whenever s > 0.5. We 
then have 



M > 



1 



K(ll + 12 In 2) 



Tk,n < 



< 



log 2 AT K 

y — 



h=Llog 2 K\+l 

4^ / (, 

K 



6 128 
+ 



# \hV^K h 2 e 2 



+41n- 



TV 



2f 



□ 



-/V — // 



+ - 



128 



log 2 JV 

E 

/»=Llog 2 K\ + l 



1 
X 2 " 



#(K+ 1)(2# + 1) 
This completes the proof of Proposition 3. 
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